On Online Control of False Discovery Rate
Abstract
Multiple hypotheses testing is a core problem in statistical inference and arises in almost every scientific field. Given a sequence of null hypotheses H(n) = (H1, ..., Hn), Benjamini and Hochberg [BH95] introduced the false discovery rate (FDR), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assigned significance level. They also proposed a different criterion, called mFDR, which does not control a property of the realized set of tests; rather, it controls the ratio of the expected number of false discoveries to the expected number of discoveries. In this paper, we propose two procedures for multiple hypotheses testing, which we call Lond and Lord. These procedures control FDR and mFDR in an online manner. Concretely, we consider an ordered (possibly infinite) sequence of null hypotheses H = (H1, H2, H3, ...) where, at each step i, the statistician must decide whether to reject hypothesis Hi having access only to the previous decisions. To the best of our knowledge, our work is the first that controls FDR in this setting. This model was introduced by Foster and Stine [FS07], whose alpha-investing rule controls only mFDR in an online manner. In order to compare different procedures, we develop lower bounds on the total discovery rate under the mixture model where each null hypothesis is truly false with probability ε, for a fixed arbitrary ε, independently of the others. Conditional on the set of true null hypotheses, the p-values are independent, and those of the non-null hypotheses are iid according to some non-uniform distribution. Under this model, we prove that both Lond and Lord make a nearly linear number of discoveries. We further propose an adjustment to Lond to address arbitrary correlation among the p-values. Finally, we evaluate the performance of our procedures on both synthetic and real data, comparing them with the alpha-investing rule, the Benjamini-Hochberg method, and a Bonferroni procedure.
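The online setting and the difference between FDR and mFDR can be illustrated with a small simulation. The sketch below is not the paper's implementation: the abstract does not state the Lond or Lord rules, so the `adaptive=True` branch assumes a rule in which the test level at step i is a fixed budget term multiplied by one plus the number of earlier discoveries, while `adaptive=False` is a plain online Bonferroni rule. The function names, the budget sequence 6α/(π²i²), and the non-null p-value distribution (a uniform draw raised to the 8th power) are all illustrative assumptions.

```python
import math
import random


def simulate_hypotheses(n, eps, rng):
    """Mixture model: each hypothesis is non-null with probability eps."""
    nulls, p_values = [], []
    for _ in range(n):
        is_null = rng.random() >= eps
        # Null p-values are uniform on [0, 1]. The non-null distribution is
        # an assumption: U**8 concentrates mass near zero.
        p = rng.random() if is_null else rng.random() ** 8
        nulls.append(is_null)
        p_values.append(p)
    return nulls, p_values


def online_test(p_values, alpha, adaptive):
    """Decide each hypothesis in order, using only earlier decisions."""
    decisions = []
    discoveries = 0
    for i, p in enumerate(p_values, start=1):
        # Budget term; these terms sum to alpha over i = 1, 2, 3, ...
        beta_i = alpha * 6.0 / (math.pi ** 2 * i ** 2)
        # Assumed adaptive rule: scale the level by (discoveries so far + 1).
        level = beta_i * (discoveries + 1) if adaptive else beta_i
        reject = p <= level
        if reject:
            discoveries += 1
        decisions.append(reject)
    return decisions


def false_discovery_proportion(decisions, nulls):
    """Realized fraction of rejections that are true nulls (0 if none)."""
    rejected = sum(decisions)
    false_rejected = sum(1 for d, h in zip(decisions, nulls) if d and h)
    return false_rejected / rejected if rejected else 0.0


def estimate_rates(n, eps, alpha, adaptive, reps, seed):
    """Monte Carlo estimates of FDR (mean FDP) and mFDR (ratio of means)."""
    rng = random.Random(seed)
    total_fdp = 0.0
    total_false = 0
    total_rejected = 0
    for _ in range(reps):
        nulls, p_values = simulate_hypotheses(n, eps, rng)
        decisions = online_test(p_values, alpha, adaptive)
        total_fdp += false_discovery_proportion(decisions, nulls)
        total_false += sum(1 for d, h in zip(decisions, nulls) if d and h)
        total_rejected += sum(decisions)
    fdr = total_fdp / reps
    mfdr = total_false / total_rejected if total_rejected else 0.0
    return fdr, mfdr


if __name__ == "__main__":
    for adaptive in (False, True):
        fdr, mfdr = estimate_rates(1000, 0.2, 0.05, adaptive, 200, seed=0)
        print(f"adaptive={adaptive}: FDR ~ {fdr:.4f}, mFDR ~ {mfdr:.4f}")
```

FDR averages the realized proportion over repetitions, whereas mFDR divides the average number of false discoveries by the average number of discoveries, so the two estimates generally differ. Because the adaptive level is never below the Bonferroni level, the adaptive rule rejects every hypothesis that the Bonferroni rule rejects on the same p-value sequence.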
Similar resources
The False Discovery Rate in Simultaneous Fisher and Adjusted Permutation Hypothesis Testing on Microarray Data
Background and Objectives: In recent years, new technologies have led to produce a large amount of data and in the field of biology, microarray technology has also dramatically developed. Meanwhile, the Fisher test is used to compare the control group with two or more experimental groups and also to detect the differentially expressed genes. In this study, the false discovery rate was investiga...
BotOnus: an online unsupervised method for Botnet detection
Botnets are recognized as one of the most dangerous threats to the Internet infrastructure. They are used for malicious activities such as launching distributed denial of service attacks, sending spam, and leaking personal information. Existing botnet detection methods produce a number of good ideas, but they are far from complete yet, since most of them cannot detect botnets in an early stage ...
Online Rules for Control of False Discovery Rate and False Discovery Exceedance
Multiple hypothesis testing is a core problem in statistical inference and arises in almost every scientific field. Given a set of null hypotheses H(n) = (H1, . . . ,Hn), Benjamini and Hochberg [BH95] introduced the false discovery rate (FDR), which is the expected proportion of false positives among rejected null hypotheses, and proposed a testing procedure that controls FDR below a pre-assign...
Considering dependence among genes and markers for false discovery control in eQTL mapping
MOTIVATION Multiple comparison adjustment is a significant and challenging statistical issue in large-scale biological studies. In previous studies, dependence among genes is largely ignored. However, such dependence may be strong for some genomic-scale studies such as genetical genomics [also called expression quantitative trait loci (eQTL) mapping] in which thousands of genes are treated as q...
Online control of the false discovery rate with decaying memory
In the online multiple testing problem, p-values corresponding to different null hypotheses are observed one by one, and the decision of whether or not to reject the current hypothesis must be made immediately, after which the next p-value is observed. Alpha-investing algorithms to control the false discovery rate (FDR), formulated by Foster and Stine, have been generalized and applied to many ...
A note on the false discovery rate of novel peptides in proteogenomics
MOTIVATION Proteogenomics has been well accepted as a tool to discover novel genes. In most conventional proteogenomic studies, a global false discovery rate is used to filter out false positives for identifying credible novel peptides. However, it has been found that the actual level of false positives in novel peptides is often out of control and behaves differently for different genomes. R...
Journal title:
- CoRR
Volume abs/1502.06197
Publication date 2015